The Joint Optimization of Spectro-Temporal Features and Neural Net Classifiers

Authors

  • György Kovács
  • László Tóth
Abstract

In speech recognition, spectro-temporal feature extraction and the training of the acoustic model are usually performed separately. To improve recognition performance, we present a combined model which allows the feature extraction filters to be trained along with a neural net classifier. Besides expecting this joint training to yield better recognition performance, we also expect that such a neural net can generate coefficients for spectro-temporal filters and enhance preexisting ones, such as those obtained with the two-dimensional Discrete Cosine Transform (2D DCT) and Gabor filters. We tested these assumptions on the TIMIT phone recognition task. The results show that while initialization based on the 2D DCT or Gabor coefficients is better in some cases than simple random initialization, the joint model in practice always outperforms the standard two-step method. Furthermore, the results can be significantly improved by using a convolutional version of the network.
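The idea of joint training described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch size (9x9), the number of filters (3x3 DCT components), the tanh nonlinearity, the three-class softmax output, the learning rate, and the helper names `dct2_filters` and `joint_step` are all assumptions chosen for illustration. The filter layer is initialized with 2D DCT basis functions, and a single gradient step then updates the filter coefficients together with the classifier weights.

```python
import numpy as np

def dct2_filters(n_freq, n_time, k_freq, k_time):
    """Build k_freq*k_time unit-norm 2D DCT-II basis filters,
    each flattened to a vector of length n_freq*n_time."""
    f = np.arange(n_freq) + 0.5
    t = np.arange(n_time) + 0.5
    rows = []
    for u in range(k_freq):
        cf = np.cos(np.pi * f * u / n_freq)      # frequency-axis cosine
        for v in range(k_time):
            ct = np.cos(np.pi * t * v / n_time)  # time-axis cosine
            w = np.outer(cf, ct).ravel()
            rows.append(w / np.linalg.norm(w))
    return np.stack(rows)

def joint_step(x, label, W_f, W_o, lr=0.01):
    """One gradient step on softmax cross-entropy that updates both
    the filter layer W_f and the output layer W_o (hypothetical model)."""
    z = W_f @ x                  # filter outputs (the "features")
    h = np.tanh(z)               # assumed nonlinearity
    logits = W_o @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    loss = -np.log(p[label])
    d_logits = p.copy()
    d_logits[label] -= 1.0       # gradient of the loss w.r.t. logits
    d_h = W_o.T @ d_logits
    d_z = d_h * (1.0 - h ** 2)   # backpropagate through tanh
    W_o_new = W_o - lr * np.outer(d_logits, h)
    W_f_new = W_f - lr * np.outer(d_z, x)  # the filters are trained too
    return loss, W_f_new, W_o_new

# Toy data: one random 9x9 spectro-temporal patch, three classes (assumed).
rng = np.random.default_rng(0)
W_f0 = dct2_filters(9, 9, 3, 3)           # 2D DCT initialization
W_o = 0.1 * rng.standard_normal((3, 9))   # random classifier weights
x = rng.standard_normal(81)
label = 1

W_f = W_f0.copy()
losses = []
for _ in range(20):
    loss, W_f, W_o = joint_step(x, label, W_f, W_o)
    losses.append(loss)
```

In the standard two-step method, `W_f` would stay fixed at `W_f0` and only `W_o` would be updated; here the error signal also reaches the filter coefficients, so they can drift away from their DCT starting point.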

Similar resources

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...

Joint Optimization of Spectro-Temporal Features and Deep Neural Nets for Robust Automatic Speech Recognition

In speech recognition, feature extraction and acoustical model training are traditionally done in two separate steps. Here, instead, we use a framework that combines spectro-temporal feature extraction and the training of neural network based acoustic models into a single process. We found earlier that this approach can be successfully applied for the recognition of speech. In this paper, we pr...

Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling

Unvoiced stops are rapidly varying sounds with acoustic cues to place identity linked to the temporal dynamics. Neurophysiological studies have indicated the importance of joint spectro-temporal processing in the human perception of stops. In this study, two distinct approaches to modeling the spectro-temporal envelope of unvoiced stop phone segments are investigated with a view to obtaining a ...

An Optimal EEG-based Emotion Recognition Algorithm Using Gabor Features

Feature extraction and accurate classification of emotion-related EEG characteristics play a key role in the success of emotion recognition systems. In this paper, an optimal EEG-based emotion recognition algorithm based on spectral features and neural network classifiers is proposed. In this algorithm, spectral, spatial and temporal features are selected from the emotion-related EEG signals by...

Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition

Spectro-temporal filtering has previously been shown to yield features that can help to increase the robustness of automatic speech recognition (ASR). We replace the spectro-temporal representation used in previous work with spectrograms that incorporate knowledge about the signal processing of the human auditory system and which are derived from Power-Normalized Cepstral Coefficients (PNC...

Publication date: 2013